May 2010 Linkscape Update (and Whiteboard Explanations of How We Do It)

Ali JalilPour June 30, 2024

As some of you have likely noticed, Linkscape’s index updated today with fresh data crawled over the past 30 days. Rather than simply provide the usual index update statistics, we thought it would be fun to draw some whiteboard diagrams of how we make a Linkscape update happen here at the mozplex. We also felt guilty because our camera ate tonight’s Whiteboard Friday (but Scott’s working hard to get it up by tomorrow morning).

Rand Writing on the Whiteboard

Linkscape, like most of the major web indices, starts with a seed set of trusted sites from which we crawl outward to build our index. Over time, we’ve developed more sophisticated methods of crawl selection, but we’re quite similar to Google in that we crawl the web primarily in descending order of importance (in our case, importance as measured by mozRank).

Step 1: We Crawl the Web

For those keeping track, this index’s raw data includes:

41,404,250,804 unique URLs/pages
86,691,236 unique root domains
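
Crawling "in descending order of importance" from a seed set can be sketched as a best-first traversal: a priority queue (the crawl frontier) always hands back the highest-scoring URL discovered so far. This is a minimal illustration, not Linkscape’s actual crawler. It assumes an in-memory link graph standing in for real fetching and parsing, and importance scores known in advance (for example, carried over from a previous index). The function crawl_order and its parameters are hypothetical.

```python
import heapq


def crawl_order(seeds, links, importance, max_pages):
    """Return URLs in the order a best-first crawler would fetch them.

    seeds      -- trusted starting URLs
    links      -- dict: URL -> list of outlinked URLs (stand-in for fetching)
    importance -- dict: URL -> score (stand-in for a mozRank-like metric)
    max_pages  -- stop after this many fetches
    """
    frontier = []  # min-heap of (-score, url): the highest score pops first
    seen = set()   # URLs already queued, so none is enqueued twice

    def enqueue(url):
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (-importance.get(url, 0.0), url))

    for url in seeds:
        enqueue(url)

    fetched = []
    while frontier and len(fetched) < max_pages:
        _, url = heapq.heappop(frontier)
        fetched.append(url)  # a real crawler would download and parse here
        for out in links.get(url, []):
            enqueue(out)
    return fetched


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
    importance = {"a": 1.0, "b": 0.2, "c": 0.9, "d": 0.5, "e": 0.1}
    print(crawl_order(["a"], links, importance, max_pages=10))
    # prints ['a', 'c', 'd', 'b', 'e']
```

In the example, d (found via the high-scoring page c) is fetched before b, even though b was discovered earlier. The frontier keeps the crawl biased toward important pages rather than toward whatever was found first.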

After crawling, we need to build the indices on which we process the data, metrics and sort orders that our API accesses.

Step 2: We Build an Index

When we started building Linkscape in late 2007 and early 2008, we quickly realized that the quantity of data would overwhelm nearly every commercial database on the market. Something massive like Oracle might be able to handle the volume, but only at an exorbitant price that a startup like SEOmoz couldn’t bear. So we built our own internal systems around flat file storage, which let us hold data, process it and serve it without the financial and engineering burdens of a full database application.
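
The post doesn’t describe the internals of these flat-file systems, so the following only illustrates the general idea: fixed-width records sorted by key can be looked up with a binary search over file offsets, with no database server involved. The 12-byte record layout (an 8-byte page ID plus a 4-byte float metric) and the functions write_index and lookup are hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical record layout: 8-byte unsigned page ID + 4-byte float metric,
# little-endian, no padding (12 bytes per record).
RECORD = struct.Struct("<Qf")


def write_index(path, rows):
    """Write (page_id, metric) rows to a flat file, sorted by page_id."""
    with open(path, "wb") as f:
        for page_id, metric in sorted(rows):
            f.write(RECORD.pack(page_id, metric))


def lookup(path, page_id):
    """Binary-search the sorted flat file; return the metric or None."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)  # the file size tells us how many records exist
        lo, hi = 0, f.tell() // RECORD.size
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD.size)  # records are fixed-width, so seek directly
            key, metric = RECORD.unpack(f.read(RECORD.size))
            if key == page_id:
                return metric
            if key < page_id:
                lo = mid + 1
            else:
                hi = mid
    return None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "metrics.bin")
    write_index(path, [(42, 0.5), (7, 0.25), (1001, 2.0)])
    print(lookup(path, 7), lookup(path, 8))  # prints 0.25 None
```

Because every record has the same size, record i starts at byte i × 12. A lookup therefore reads only about log2(n) records. The cost is that adding data means rewriting the sorted file, a trade-off that is easier to accept when the whole index is rebuilt on a monthly cycle.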

Our next step, once the index is in place, is to calculate our key metrics and tabulate the standard sort orders for the API.

Step 3: We Conduct Processing

Algorithms like PageRank (and mozRank) are iterative and require a tremendous amount of processing power to compute. We’re able to do this in the cloud, scaling up our number-crunching, mozRank-calculating capacity for about a week out of every month. We’re fairly convinced, though, that in Google’s early days this was a big barrier (and may even have been a large part of the reason the “Google Dance” only happened once every 30 days).
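
To show why this computation is iterative, here is a minimal power-iteration sketch of the textbook PageRank formulation, using the commonly cited damping factor of 0.85. The post doesn’t publish mozRank’s formula, so this should not be read as how mozRank is computed. It also runs in memory on a toy graph, whereas a web-scale version would have to be distributed across many machines. Each pass redistributes every page’s score across its outlinks, and the loop repeats until the scores stop changing.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=500):
    """Textbook PageRank by power iteration over a dict: URL -> outlinked URLs."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start from a uniform distribution
    for _ in range(max_iter):
        # Pages with no outlinks ("dangling") spread their rank over all pages.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        base = (1.0 - damping) / n + damping * dangling / n
        new = {u: base for u in nodes}
        # Every other page splits its damped rank evenly across its outlinks.
        for u, outs in links.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)  # L1 change this pass
        rank = new
        if delta < tol:
            break
    return rank
```

Consider a graph where both a and c link to b, and b links back to a. There, b ends up with the highest score, and c (which nothing links to) keeps only the baseline (1 - 0.85)/3 = 0.05. Every pass touches every link in the graph, and it takes many passes to converge. That is why the computation is so expensive at the scale of tens of billions of URLs.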

After processing, we’re ready to push our data out into the SEOmoz API, where it can power our tools and those of our many partners, friends and community members.

Step 4: We Push the Data to the API

The API currently serves more than 2 million requests for data each day (and an average request pulls ~10 metrics/pieces of data about a web page or site). That’s a lot, but our goal is to more than triple that volume by 2011, at which point we’ll be closer to the request volume handled by a service like Yahoo! Site Explorer.
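
Here is a quick back-of-the-envelope reading of these numbers. It assumes requests are spread evenly over 24 hours (real traffic peaks well above the average) and exactly 10 metrics per request. The 2011 line uses a straight tripling, which is the floor of the stated goal:

```latex
\begin{aligned}
\text{average rate today} &\approx \frac{2\,000\,000\ \text{requests}}{86\,400\ \text{s}} \approx 23\ \text{requests/s} \\
\text{metrics served today} &\approx 2\,000\,000 \times 10 = 2 \times 10^{7}\ \text{per day} \\
\text{2011 target } (3\times) &\approx 6\,000\,000\ \text{requests/day} \approx 69\ \text{requests/s} \approx 6 \times 10^{7}\ \text{metrics/day}
\end{aligned}
```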

The SEOmoz API currently powers some very cool stuff:

Open Site Explorer – my personal favorite way to get link information
The mozBar – the SERPs overlay, analyze page feature and the link metrics displayed directly in the bar all come from the API
Classic Linkscape – we’re on our way to transitioning all of the features and functionality in Linkscape over to OSE, but in the meantime, PRO members can get access to many more granular metrics through these reports
Dozens of External Applications – things like Carter Cole’s Google Chrome toolbar, several tools from Virante’s suite, Website Grader and lots more (we have an application gallery coming soon)

Each month, we repeat this process, learning big and small lessons along the way. We’ve become tremendously more consistent, redundant and error-free so far in 2010, and our next big goal is to dramatically increase the depth of our crawl into those dark crevices of the web while improving the value and accuracy of our metrics.

We look forward to your feedback around this latest index update and any of the tools powered by Linkscape. Have a great Memorial Day Weekend!
